from google.colab import files
uploaded = files.upload()
Saving energy_dataset.csv to energy_dataset.csv
#import the pandas libraries and assign as pd
import pandas as pd
#read the csv data file called"energy_dataset.csv"
energy_df=pd.read_csv("energy_dataset.csv")
# Display the contents of the DataFrame "energy_df"
energy_df
| time | generation biomass | generation fossil brown coal/lignite | generation fossil coal-derived gas | generation fossil gas | generation fossil hard coal | generation fossil oil | generation fossil oil shale | generation fossil peat | generation geothermal | ... | generation waste | generation wind offshore | generation wind onshore | forecast solar day ahead | forecast wind offshore eday ahead | forecast wind onshore day ahead | total load forecast | total load actual | price day ahead | price actual | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-01-01 00:00:00+01:00 | 447.0 | 329.0 | 0.0 | 4844.0 | 4821.0 | 162.0 | 0.0 | 0.0 | 0.0 | ... | 196.0 | 0.0 | 6378.0 | 17.0 | NaN | 6436.0 | 26118.0 | 25385.0 | 50.10 | 65.41 |
| 1 | 2015-01-01 01:00:00+01:00 | 449.0 | 328.0 | 0.0 | 5196.0 | 4755.0 | 158.0 | 0.0 | 0.0 | 0.0 | ... | 195.0 | 0.0 | 5890.0 | 16.0 | NaN | 5856.0 | 24934.0 | 24382.0 | 48.10 | 64.92 |
| 2 | 2015-01-01 02:00:00+01:00 | 448.0 | 323.0 | 0.0 | 4857.0 | 4581.0 | 157.0 | 0.0 | 0.0 | 0.0 | ... | 196.0 | 0.0 | 5461.0 | 8.0 | NaN | 5454.0 | 23515.0 | 22734.0 | 47.33 | 64.48 |
| 3 | 2015-01-01 03:00:00+01:00 | 438.0 | 254.0 | 0.0 | 4314.0 | 4131.0 | 160.0 | 0.0 | 0.0 | 0.0 | ... | 191.0 | 0.0 | 5238.0 | 2.0 | NaN | 5151.0 | 22642.0 | 21286.0 | 42.27 | 59.32 |
| 4 | 2015-01-01 04:00:00+01:00 | 428.0 | 187.0 | 0.0 | 4130.0 | 3840.0 | 156.0 | 0.0 | 0.0 | 0.0 | ... | 189.0 | 0.0 | 4935.0 | 9.0 | NaN | 4861.0 | 21785.0 | 20264.0 | 38.41 | 56.04 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 35059 | 2018-12-31 19:00:00+01:00 | 297.0 | 0.0 | 0.0 | 7634.0 | 2628.0 | 178.0 | 0.0 | 0.0 | 0.0 | ... | 277.0 | 0.0 | 3113.0 | 96.0 | NaN | 3253.0 | 30619.0 | 30653.0 | 68.85 | 77.02 |
| 35060 | 2018-12-31 20:00:00+01:00 | 296.0 | 0.0 | 0.0 | 7241.0 | 2566.0 | 174.0 | 0.0 | 0.0 | 0.0 | ... | 280.0 | 0.0 | 3288.0 | 51.0 | NaN | 3353.0 | 29932.0 | 29735.0 | 68.40 | 76.16 |
| 35061 | 2018-12-31 21:00:00+01:00 | 292.0 | 0.0 | 0.0 | 7025.0 | 2422.0 | 168.0 | 0.0 | 0.0 | 0.0 | ... | 286.0 | 0.0 | 3503.0 | 36.0 | NaN | 3404.0 | 27903.0 | 28071.0 | 66.88 | 74.30 |
| 35062 | 2018-12-31 22:00:00+01:00 | 293.0 | 0.0 | 0.0 | 6562.0 | 2293.0 | 163.0 | 0.0 | 0.0 | 0.0 | ... | 287.0 | 0.0 | 3586.0 | 29.0 | NaN | 3273.0 | 25450.0 | 25801.0 | 63.93 | 69.89 |
| 35063 | 2018-12-31 23:00:00+01:00 | 290.0 | 0.0 | 0.0 | 6926.0 | 2166.0 | 163.0 | 0.0 | 0.0 | 0.0 | ... | 287.0 | 0.0 | 3651.0 | 26.0 | NaN | 3117.0 | 24424.0 | 24455.0 | 64.27 | 69.88 |
35064 rows × 29 columns
#creating the energy_loss column to the dataframe "energy_df"
# This line calculates the difference between the "total load forecast" column and the "total load actual" column
energy_df["energy_loss"]=energy_df["total load forecast"]-energy_df["total load actual"]
# Assigns the calculated series of energy loss values to a new column named "energy_loss" in the DataFrame energy_df
energy_df["energy_loss"]
0 733.0
1 552.0
2 781.0
3 1356.0
4 1521.0
...
35059 -34.0
35060 197.0
35061 -168.0
35062 -351.0
35063 -31.0
Name: energy_loss, Length: 35064, dtype: float64
#Display the list of columns names
energy_df.columns
Index(['time', 'generation biomass', 'generation fossil brown coal/lignite',
'generation fossil coal-derived gas', 'generation fossil gas',
'generation fossil hard coal', 'generation fossil oil',
'generation fossil oil shale', 'generation fossil peat',
'generation geothermal', 'generation hydro pumped storage aggregated',
'generation hydro pumped storage consumption',
'generation hydro run-of-river and poundage',
'generation hydro water reservoir', 'generation marine',
'generation nuclear', 'generation other', 'generation other renewable',
'generation solar', 'generation waste', 'generation wind offshore',
'generation wind onshore', 'forecast solar day ahead',
'forecast wind offshore eday ahead', 'forecast wind onshore day ahead',
'total load forecast', 'total load actual', 'price day ahead',
'price actual', 'energy_loss'],
dtype='object')
from google.colab import files
uploaded = files.upload()
Saving weather_features.csv to weather_features.csv
# Use the pandas function "read_csv()" to read the contents of the CSV file "weather_features.csv"
# The function returns a DataFrame containing the data from the CSV file
weather_df=pd.read_csv("weather_features.csv")
#Assigns the result of the merge operation to a new DataFrame named final.
final=weather_df.merge(energy_df,how="inner",left_on="dt_iso",right_on="time")
# display a summary of information about the DataFrame's structure and contents.
final.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 178396 entries, 0 to 178395 Data columns (total 47 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 dt_iso 178396 non-null object 1 city_name 178396 non-null object 2 temp 178396 non-null float64 3 temp_min 178396 non-null float64 4 temp_max 178396 non-null float64 5 pressure 178396 non-null int64 6 humidity 178396 non-null int64 7 wind_speed 178396 non-null int64 8 wind_deg 178396 non-null int64 9 rain_1h 178396 non-null float64 10 rain_3h 178396 non-null float64 11 snow_3h 178396 non-null float64 12 clouds_all 178396 non-null int64 13 weather_id 178396 non-null int64 14 weather_main 178396 non-null object 15 weather_description 178396 non-null object 16 weather_icon 178396 non-null object 17 time 178396 non-null object 18 generation biomass 178301 non-null float64 19 generation fossil brown coal/lignite 178306 non-null float64 20 generation fossil coal-derived gas 178306 non-null float64 21 generation fossil gas 178306 non-null float64 22 generation fossil hard coal 178306 non-null float64 23 generation fossil oil 178301 non-null float64 24 generation fossil oil shale 178306 non-null float64 25 generation fossil peat 178306 non-null float64 26 generation geothermal 178306 non-null float64 27 generation hydro pumped storage aggregated 0 non-null float64 28 generation hydro pumped storage consumption 178301 non-null float64 29 generation hydro run-of-river and poundage 178301 non-null float64 30 generation hydro water reservoir 178306 non-null float64 31 generation marine 178301 non-null float64 32 generation nuclear 178311 non-null float64 33 generation other 178306 non-null float64 34 generation other renewable 178306 non-null float64 35 generation solar 178306 non-null float64 36 generation waste 178301 non-null float64 37 generation wind offshore 178306 non-null float64 38 generation wind onshore 178306 non-null float64 39 forecast solar day ahead 178396 non-null float64 40 forecast wind offshore eday ahead 0 non-null float64 41 forecast wind onshore day ahead 178396 non-null float64 42 total load forecast 178396 non-null float64 43 total load actual 178216 non-null float64 44 price day ahead 178396 non-null float64 45 price actual 178396 non-null float64 46 energy_loss 178216 non-null float64 dtypes: float64(35), int64(6), object(6) memory usage: 65.3+ MB
#display a summary of information about the DataFrame's structure and contents.
weather_df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 178396 entries, 0 to 178395 Data columns (total 17 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 dt_iso 178396 non-null object 1 city_name 178396 non-null object 2 temp 178396 non-null float64 3 temp_min 178396 non-null float64 4 temp_max 178396 non-null float64 5 pressure 178396 non-null int64 6 humidity 178396 non-null int64 7 wind_speed 178396 non-null int64 8 wind_deg 178396 non-null int64 9 rain_1h 178396 non-null float64 10 rain_3h 178396 non-null float64 11 snow_3h 178396 non-null float64 12 clouds_all 178396 non-null int64 13 weather_id 178396 non-null int64 14 weather_main 178396 non-null object 15 weather_description 178396 non-null object 16 weather_icon 178396 non-null object dtypes: float64(6), int64(6), object(5) memory usage: 23.1+ MB
#count of non-null values in the corresponding column of the DataFrame.
energy_df.count()
time 35064 generation biomass 35045 generation fossil brown coal/lignite 35046 generation fossil coal-derived gas 35046 generation fossil gas 35046 generation fossil hard coal 35046 generation fossil oil 35045 generation fossil oil shale 35046 generation fossil peat 35046 generation geothermal 35046 generation hydro pumped storage aggregated 0 generation hydro pumped storage consumption 35045 generation hydro run-of-river and poundage 35045 generation hydro water reservoir 35046 generation marine 35045 generation nuclear 35047 generation other 35046 generation other renewable 35046 generation solar 35046 generation waste 35045 generation wind offshore 35046 generation wind onshore 35046 forecast solar day ahead 35064 forecast wind offshore eday ahead 0 forecast wind onshore day ahead 35064 total load forecast 35064 total load actual 35028 price day ahead 35064 price actual 35064 energy_loss 35028 dtype: int64
import plotly.express as px
time_line_dict= [{
"Task":"Defining objectives",
"Start":"2023-7-3",
"End":"2023-7-4"
},
{"Task":"Data Collection",
"Start":"2023-7-5",
"End":"2023-7-7"},
{"Task":"Data Exploration",
"Start":"2023-7-5",
"End":"2023-7-7"},
{"Task":"Feature Selection and Engineering",
"Start":"2023-7-11",
"End":"2023-7-15"},
{"Task":"Model Selection and Developement",
"Start":"2023-7-15",
"End":"2023-7-20"},{"Task":"Model Evaluation and Refinement",
"Start":"2023-7-20",
"End":"2023-7-27"},
{"Task":"Interpretation and presentation of results",
"Start":"2023-7-28",
"End":"2023-8-3"},
{"Task":"deployment and monitoring",
"Start":"2023-8-4",
"End":"2023-8-13"}
]
fig=px.timeline(pd.DataFrame(time_line_dict), x_start="Start", x_end="End", y="Task")
fig.update_yaxes(autorange="reversed") # otherwise tasks are listed from the bottom up
fig.show()
#drop the specified columns from the DataFrame "final"
final.drop(columns=["forecast wind offshore eday ahead","generation hydro pumped storage aggregated"],axis=1,inplace=True)
# List of columns to be summed from the DataFrame
#calculating the total energy generation across different sources.
columns_to_sum=['generation biomass', 'generation fossil brown coal/lignite',
'generation fossil coal-derived gas', 'generation fossil gas',
'generation fossil hard coal', 'generation fossil oil',
'generation fossil oil shale', 'generation fossil peat',
'generation geothermal',
'generation hydro pumped storage consumption',
'generation hydro run-of-river and poundage',
'generation hydro water reservoir', 'generation marine',
'generation nuclear', 'generation other', 'generation other renewable',
'generation solar', 'generation waste', 'generation wind offshore',
'generation wind onshore']
# Display data types of columns in the DataFrame "energy_df"
energy_df.dtypes
time object generation biomass float64 generation fossil brown coal/lignite float64 generation fossil coal-derived gas float64 generation fossil gas float64 generation fossil hard coal float64 generation fossil oil float64 generation fossil oil shale float64 generation fossil peat float64 generation geothermal float64 generation hydro pumped storage aggregated float64 generation hydro pumped storage consumption float64 generation hydro run-of-river and poundage float64 generation hydro water reservoir float64 generation marine float64 generation nuclear float64 generation other float64 generation other renewable float64 generation solar float64 generation waste float64 generation wind offshore float64 generation wind onshore float64 forecast solar day ahead float64 forecast wind offshore eday ahead float64 forecast wind onshore day ahead float64 total load forecast float64 total load actual float64 price day ahead float64 price actual float64 energy_loss float64 dtype: object
# Import the "pyplot" submodule from the "matplotlib" library, which provides plotting functions
import matplotlib.pyplot as plt
# the "plot()" function from the "pyplot" submodule to create a line plot
# The x-axis data is taken from the "time" column of the first 1000 rows of the DataFrame "energy_df"
# The y-axis data is taken from the "energy_loss" column of the same 1000 rows
plt.plot(energy_df[:1000]["time"],energy_df[:1000]["energy_loss"])
plt.show()
#uses the corr() method on the DataFrame 'final' to compute the correlation matrix.
final.corr().columns
<ipython-input-26-fce803a60abf>:3: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
Index(['temp', 'temp_min', 'temp_max', 'pressure', 'humidity', 'wind_speed',
'wind_deg', 'rain_1h', 'rain_3h', 'snow_3h', 'clouds_all', 'weather_id',
'generation biomass', 'generation fossil brown coal/lignite',
'generation fossil coal-derived gas', 'generation fossil gas',
'generation fossil hard coal', 'generation fossil oil',
'generation fossil oil shale', 'generation fossil peat',
'generation geothermal', 'generation hydro pumped storage consumption',
'generation hydro run-of-river and poundage',
'generation hydro water reservoir', 'generation marine',
'generation nuclear', 'generation other', 'generation other renewable',
'generation solar', 'generation waste', 'generation wind offshore',
'generation wind onshore', 'forecast solar day ahead',
'forecast wind onshore day ahead', 'total load forecast',
'total load actual', 'price day ahead', 'price actual', 'energy_loss'],
dtype='object')
# Import the "seaborn" library for data visualization tools
import seaborn as sns
# The resulting heatmap provides a visual representation of the correlation relationships
sns.heatmap(final.corr());
<ipython-input-27-eb325e8a509d>:5: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
final.corr()
<ipython-input-28-4fee8eca12fa>:1: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
| temp | temp_min | temp_max | pressure | humidity | wind_speed | wind_deg | rain_1h | rain_3h | snow_3h | ... | generation waste | generation wind offshore | generation wind onshore | forecast solar day ahead | forecast wind onshore day ahead | total load forecast | total load actual | price day ahead | price actual | energy_loss | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| temp | 1.000000 | 0.974541 | 0.966853 | -0.008833 | -0.573542 | 0.115307 | -0.052199 | -0.066632 | -0.010022 | -0.039008 | ... | 0.078189 | NaN | -0.125695 | 0.383305 | -0.126883 | 0.179700 | 0.181200 | 0.061611 | 0.069932 | -0.007603 |
| temp_min | 0.974541 | 1.000000 | 0.892425 | -0.007505 | -0.569617 | 0.113380 | -0.041872 | -0.071634 | -0.003528 | -0.035890 | ... | 0.086142 | NaN | -0.118171 | 0.375994 | -0.119198 | 0.178020 | 0.179328 | 0.073644 | 0.080857 | -0.005731 |
| temp_max | 0.966853 | 0.892425 | 1.000000 | -0.009710 | -0.534234 | 0.101714 | -0.067548 | -0.061496 | -0.016446 | -0.040011 | ... | 0.065756 | NaN | -0.128128 | 0.367203 | -0.129384 | 0.168685 | 0.170405 | 0.042390 | 0.051325 | -0.010385 |
| pressure | -0.008833 | -0.007505 | -0.009710 | 1.000000 | -0.027458 | 0.001379 | 0.002265 | 0.039309 | -0.000465 | -0.000200 | ... | -0.012980 | NaN | 0.010336 | -0.003001 | 0.010100 | -0.000906 | -0.000990 | -0.009851 | -0.007214 | 0.000832 |
| humidity | -0.573542 | -0.569617 | -0.534234 | -0.027458 | 1.000000 | -0.250336 | -0.029316 | 0.134445 | 0.014036 | 0.023744 | ... | 0.002689 | NaN | -0.026042 | -0.390739 | -0.024132 | -0.245748 | -0.245296 | -0.025828 | -0.024741 | -0.014864 |
| wind_speed | 0.115307 | 0.113380 | 0.101714 | 0.001379 | -0.250336 | 1.000000 | 0.261888 | 0.052220 | -0.019366 | -0.006230 | ... | -0.048364 | NaN | 0.211037 | 0.137233 | 0.210601 | 0.125179 | 0.126286 | -0.079933 | -0.146129 | -0.006039 |
| wind_deg | -0.052199 | -0.041872 | -0.067548 | 0.002265 | -0.029316 | 0.261888 | 1.000000 | 0.039426 | 0.002445 | -0.014599 | ... | -0.049592 | NaN | 0.094539 | -0.051249 | 0.094577 | -0.039849 | -0.041705 | -0.078951 | -0.099958 | 0.015604 |
| rain_1h | -0.066632 | -0.071634 | -0.061496 | 0.039309 | 0.134445 | 0.052220 | 0.039426 | 1.000000 | -0.009862 | 0.040347 | ... | -0.075450 | NaN | 0.064244 | -0.013872 | 0.064044 | 0.011445 | 0.012259 | -0.035598 | -0.035814 | -0.009319 |
| rain_3h | -0.010022 | -0.003528 | -0.016446 | -0.000465 | 0.014036 | -0.019366 | 0.002445 | -0.009862 | 1.000000 | -0.001063 | ... | -0.043109 | NaN | 0.000168 | 0.002119 | 0.000262 | -0.002777 | -0.003210 | -0.014641 | -0.009344 | 0.003499 |
| snow_3h | -0.039008 | -0.035890 | -0.040011 | -0.000200 | 0.023744 | -0.006230 | -0.014599 | 0.040347 | -0.001063 | 1.000000 | ... | -0.033426 | NaN | -0.000810 | 0.008593 | -0.000858 | -0.004551 | -0.004486 | -0.002330 | 0.006581 | -0.001467 |
| clouds_all | -0.221331 | -0.208759 | -0.226416 | 0.004443 | 0.400483 | 0.051049 | 0.034008 | 0.229401 | 0.024327 | 0.044464 | ... | -0.036354 | NaN | 0.070695 | -0.044241 | 0.070136 | 0.012446 | 0.013725 | -0.016462 | -0.052895 | -0.012542 |
| weather_id | 0.157494 | 0.157292 | 0.149840 | -0.004053 | -0.290514 | -0.042262 | -0.030328 | -0.461414 | 0.020114 | -0.050192 | ... | 0.044629 | NaN | -0.075360 | 0.056795 | -0.075355 | -0.001231 | -0.002647 | 0.015878 | 0.031498 | 0.014769 |
| generation biomass | 0.035562 | 0.026501 | 0.037321 | 0.006566 | -0.022922 | -0.022654 | 0.016182 | 0.025411 | 0.038221 | 0.014048 | ... | -0.346049 | NaN | -0.071454 | -0.008640 | -0.075154 | 0.084213 | 0.082126 | 0.108610 | 0.139684 | 0.026574 |
| generation fossil brown coal/lignite | 0.060444 | 0.057921 | 0.059616 | -0.009407 | 0.009193 | -0.096152 | -0.072467 | -0.044544 | -0.002908 | -0.006839 | ... | 0.281694 | NaN | -0.433778 | 0.042036 | -0.435710 | 0.277840 | 0.279773 | 0.567427 | 0.362119 | -0.006434 |
| generation fossil coal-derived gas | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| generation fossil gas | 0.098760 | 0.103899 | 0.085671 | -0.007174 | -0.067029 | -0.058576 | -0.073656 | -0.035887 | -0.016535 | -0.009964 | ... | 0.273775 | NaN | -0.396052 | 0.079422 | -0.395979 | 0.544385 | 0.549438 | 0.641082 | 0.461567 | -0.029822 |
| generation fossil hard coal | 0.075921 | 0.070676 | 0.074213 | -0.009351 | -0.022854 | -0.088650 | -0.064558 | -0.026681 | 0.009503 | -0.002292 | ... | 0.169388 | NaN | -0.441006 | 0.046475 | -0.443574 | 0.394580 | 0.396735 | 0.671350 | 0.463768 | -0.005262 |
| generation fossil oil | 0.098213 | 0.095109 | 0.090992 | -0.002862 | -0.093322 | -0.010796 | -0.020351 | 0.004391 | 0.018493 | 0.001296 | ... | -0.175926 | NaN | -0.052026 | 0.096486 | -0.058492 | 0.498399 | 0.496656 | 0.291538 | 0.283570 | 0.045628 |
| generation fossil oil shale | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| generation fossil peat | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| generation geothermal | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| generation hydro pumped storage consumption | -0.200733 | -0.200461 | -0.187952 | 0.008045 | 0.135608 | 0.029410 | 0.067193 | 0.007115 | -0.001016 | -0.006052 | ... | -0.187205 | NaN | 0.387939 | -0.222139 | 0.389391 | -0.559705 | -0.562719 | -0.600289 | -0.425665 | 0.008499 |
| generation hydro run-of-river and poundage | -0.094031 | -0.083967 | -0.097980 | 0.006282 | -0.015607 | 0.103535 | 0.053272 | 0.039117 | 0.003958 | 0.015146 | ... | -0.284832 | NaN | 0.223480 | 0.044964 | 0.226772 | 0.120677 | 0.118790 | -0.294699 | -0.136326 | 0.025318 |
| generation hydro water reservoir | -0.016122 | -0.022264 | -0.006460 | 0.009787 | -0.059711 | 0.070773 | 0.011599 | 0.036395 | 0.010565 | 0.009965 | ... | -0.287344 | NaN | -0.018325 | 0.102313 | -0.010482 | 0.476886 | 0.479831 | -0.017583 | 0.072349 | -0.009323 |
| generation marine | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| generation nuclear | -0.027666 | -0.035792 | -0.017446 | 0.009016 | 0.013120 | 0.013945 | 0.003441 | 0.021597 | -0.003753 | -0.005751 | ... | 0.087068 | NaN | 0.050326 | 0.000454 | 0.046849 | 0.090565 | 0.086333 | -0.044226 | -0.052200 | 0.051067 |
| generation other | -0.029561 | -0.035290 | -0.023901 | 0.010092 | 0.009448 | -0.010507 | 0.019226 | 0.025908 | 0.029510 | 0.007493 | ... | -0.360662 | NaN | 0.045954 | -0.019601 | 0.043062 | 0.101230 | 0.100589 | 0.044224 | 0.099534 | 0.013591 |
| generation other renewable | 0.000076 | 0.021759 | -0.021381 | -0.009222 | -0.013323 | -0.012568 | -0.046320 | -0.048473 | -0.047161 | -0.029162 | ... | 0.613788 | NaN | -0.135310 | 0.027202 | -0.136993 | 0.178783 | 0.182773 | 0.429029 | 0.257654 | -0.026187 |
| generation solar | 0.380767 | 0.373690 | 0.364353 | -0.003049 | -0.393232 | 0.136741 | -0.049131 | -0.014371 | 0.001884 | 0.010418 | ... | 0.000678 | NaN | -0.166908 | 0.993225 | -0.172551 | 0.397345 | 0.394375 | 0.057769 | 0.097720 | 0.047448 |
| generation waste | 0.078189 | 0.086142 | 0.065756 | -0.012980 | 0.002689 | -0.048364 | -0.049592 | -0.075450 | -0.043109 | -0.033426 | ... | 1.000000 | NaN | -0.179539 | 0.000844 | -0.183996 | 0.076476 | 0.078378 | 0.368187 | 0.170182 | -0.010609 |
| generation wind offshore | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| generation wind onshore | -0.125695 | -0.118171 | -0.128128 | 0.010336 | -0.026042 | 0.211037 | 0.094539 | 0.064244 | 0.000168 | -0.000810 | ... | -0.179539 | NaN | 1.000000 | -0.170121 | 0.994405 | 0.039633 | 0.042074 | -0.422481 | -0.218339 | -0.021957 |
| forecast solar day ahead | 0.383305 | 0.375994 | 0.367203 | -0.003001 | -0.390739 | 0.137233 | -0.051249 | -0.013872 | 0.002119 | 0.008593 | ... | 0.000844 | NaN | -0.170121 | 1.000000 | -0.174701 | 0.404597 | 0.402687 | 0.061788 | 0.100664 | 0.041717 |
| forecast wind onshore day ahead | -0.126883 | -0.119198 | -0.129384 | 0.010100 | -0.024132 | 0.210601 | 0.094577 | 0.064044 | 0.000262 | -0.000858 | ... | -0.183996 | NaN | 0.994405 | -0.174701 | 1.000000 | 0.037186 | 0.039649 | -0.426549 | -0.219190 | -0.020653 |
| total load forecast | 0.179700 | 0.178020 | 0.168685 | -0.000906 | -0.245748 | 0.125179 | -0.039849 | 0.011445 | -0.002777 | -0.004551 | ... | 0.076476 | NaN | 0.039633 | 0.404597 | 0.037186 | 1.000000 | 0.995150 | 0.475440 | 0.435944 | 0.088711 |
| total load actual | 0.181200 | 0.179328 | 0.170405 | -0.000990 | -0.245296 | 0.126286 | -0.041705 | 0.012259 | -0.003210 | -0.004486 | ... | 0.078378 | NaN | 0.042074 | 0.402687 | 0.039649 | 0.995150 | 1.000000 | 0.474668 | 0.436263 | -0.009703 |
| price day ahead | 0.061611 | 0.073644 | 0.042390 | -0.009851 | -0.025828 | -0.079933 | -0.078951 | -0.035598 | -0.014641 | -0.002330 | ... | 0.368187 | NaN | -0.422481 | 0.061788 | -0.426549 | 0.475440 | 0.474668 | 1.000000 | 0.730636 | 0.022470 |
| price actual | 0.069932 | 0.080857 | 0.051325 | -0.007214 | -0.024741 | -0.146129 | -0.099958 | -0.035814 | -0.009344 | 0.006581 | ... | 0.170182 | NaN | -0.218339 | 0.100664 | -0.219190 | 0.435944 | 0.436263 | 0.730636 | 1.000000 | 0.020216 |
| energy_loss | -0.007603 | -0.005731 | -0.010385 | 0.000832 | -0.014864 | -0.006039 | 0.015604 | -0.009319 | 0.003499 | -0.001467 | ... | -0.010609 | NaN | -0.021957 | 0.041717 | -0.020653 | 0.088711 | -0.009703 | 0.022470 | 0.020216 | 1.000000 |
39 rows × 39 columns
# Define a list of column names that have no significant correlation with other columns
columns_with_no_corr=["generation fossil oil shale","generation geothermal","generation wind offshore","generation marine","generation fossil coal-derived gas"]
final.drop(columns=columns_with_no_corr,axis=1,inplace=True)
final.drop(columns=["generation fossil peat"],axis=1,inplace=True)
import seaborn as sns
sns.heatmap(final.corr());
<ipython-input-30-cae2ab37ff4a>:6: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
#compute the correlation matrix for the DataFrame final and then retrieve the column names for which correlations were computed.
final.corr().columns
<ipython-input-31-71e34a2c95b7>:3: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
Index(['temp', 'temp_min', 'temp_max', 'pressure', 'humidity', 'wind_speed',
'wind_deg', 'rain_1h', 'rain_3h', 'snow_3h', 'clouds_all', 'weather_id',
'generation biomass', 'generation fossil brown coal/lignite',
'generation fossil gas', 'generation fossil hard coal',
'generation fossil oil', 'generation hydro pumped storage consumption',
'generation hydro run-of-river and poundage',
'generation hydro water reservoir', 'generation nuclear',
'generation other', 'generation other renewable', 'generation solar',
'generation waste', 'generation wind onshore',
'forecast solar day ahead', 'forecast wind onshore day ahead',
'total load forecast', 'total load actual', 'price day ahead',
'price actual', 'energy_loss'],
dtype='object')
#final["total load forecast"].hist()
sns.histplot(x=final["total load forecast"])
<Axes: xlabel='total load forecast', ylabel='Count'>
# Import necessary libraries from scikit-learn
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.preprocessing import MinMaxScaler
# Import necessary libraries
from datetime import datetime
# Convert the 'dt_iso' column to datetime format
final["dt_iso"]= final["dt_iso"].apply(lambda x : datetime.fromisoformat(x.replace("Z", "+00:00")))
from statsmodels.tsa.seasonal import seasonal_decompose
# Perform seasonal decomposition for 'total load forecast' in the city of Valencia using additive model
result_add = seasonal_decompose(final[final["city_name"]=="Valencia"]["total load forecast"], model='additive', extrapolate_trend='freq', period=8760)
# Plot
plt.rcParams.update({'figure.figsize': (10,20)})
result_add.plot().suptitle('Additive Decomposition', fontsize=22)
plt.show()
final
| dt_iso | city_name | temp | temp_min | temp_max | pressure | humidity | wind_speed | wind_deg | rain_1h | ... | generation solar | generation waste | generation wind onshore | forecast solar day ahead | forecast wind onshore day ahead | total load forecast | total load actual | price day ahead | price actual | energy_loss | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-01-01 00:00:00+01:00 | Valencia | 270.475000 | 270.475000 | 270.475000 | 1001 | 77 | 1 | 62 | 0.0 | ... | 49.0 | 196.0 | 6378.0 | 17.0 | 6436.0 | 26118.0 | 25385.0 | 50.10 | 65.41 | 733.0 |
| 1 | 2015-01-01 00:00:00+01:00 | Madrid | 267.325000 | 267.325000 | 267.325000 | 971 | 63 | 1 | 309 | 0.0 | ... | 49.0 | 196.0 | 6378.0 | 17.0 | 6436.0 | 26118.0 | 25385.0 | 50.10 | 65.41 | 733.0 |
| 2 | 2015-01-01 00:00:00+01:00 | Bilbao | 269.657312 | 269.657312 | 269.657312 | 1036 | 97 | 0 | 226 | 0.0 | ... | 49.0 | 196.0 | 6378.0 | 17.0 | 6436.0 | 26118.0 | 25385.0 | 50.10 | 65.41 | 733.0 |
| 3 | 2015-01-01 00:00:00+01:00 | Barcelona | 281.625000 | 281.625000 | 281.625000 | 1035 | 100 | 7 | 58 | 0.0 | ... | 49.0 | 196.0 | 6378.0 | 17.0 | 6436.0 | 26118.0 | 25385.0 | 50.10 | 65.41 | 733.0 |
| 4 | 2015-01-01 00:00:00+01:00 | Seville | 273.375000 | 273.375000 | 273.375000 | 1039 | 75 | 1 | 21 | 0.0 | ... | 49.0 | 196.0 | 6378.0 | 17.0 | 6436.0 | 26118.0 | 25385.0 | 50.10 | 65.41 | 733.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 178391 | 2018-12-31 23:00:00+01:00 | Valencia | 279.140000 | 278.150000 | 280.150000 | 1029 | 75 | 2 | 300 | 0.0 | ... | 31.0 | 287.0 | 3651.0 | 26.0 | 3117.0 | 24424.0 | 24455.0 | 64.27 | 69.88 | -31.0 |
| 178392 | 2018-12-31 23:00:00+01:00 | Madrid | 275.150000 | 275.150000 | 275.150000 | 1031 | 74 | 1 | 360 | 0.0 | ... | 31.0 | 287.0 | 3651.0 | 26.0 | 3117.0 | 24424.0 | 24455.0 | 64.27 | 69.88 | -31.0 |
| 178393 | 2018-12-31 23:00:00+01:00 | Bilbao | 275.600000 | 275.150000 | 276.150000 | 1034 | 93 | 2 | 100 | 0.0 | ... | 31.0 | 287.0 | 3651.0 | 26.0 | 3117.0 | 24424.0 | 24455.0 | 64.27 | 69.88 | -31.0 |
| 178394 | 2018-12-31 23:00:00+01:00 | Barcelona | 280.130000 | 277.150000 | 283.150000 | 1028 | 100 | 5 | 310 | 0.0 | ... | 31.0 | 287.0 | 3651.0 | 26.0 | 3117.0 | 24424.0 | 24455.0 | 64.27 | 69.88 | -31.0 |
| 178395 | 2018-12-31 23:00:00+01:00 | Seville | 283.970000 | 282.150000 | 285.150000 | 1029 | 70 | 3 | 50 | 0.0 | ... | 31.0 | 287.0 | 3651.0 | 26.0 | 3117.0 | 24424.0 | 24455.0 | 64.27 | 69.88 | -31.0 |
178396 rows × 39 columns
weather_df
| dt_iso | city_name | temp | temp_min | temp_max | pressure | humidity | wind_speed | wind_deg | rain_1h | rain_3h | snow_3h | clouds_all | weather_id | weather_main | weather_description | weather_icon | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-01-01 00:00:00+01:00 | Valencia | 270.475 | 270.475 | 270.475 | 1001 | 77 | 1 | 62 | 0.0 | 0.0 | 0.0 | 0 | 800 | clear | sky is clear | 01n |
| 1 | 2015-01-01 01:00:00+01:00 | Valencia | 270.475 | 270.475 | 270.475 | 1001 | 77 | 1 | 62 | 0.0 | 0.0 | 0.0 | 0 | 800 | clear | sky is clear | 01n |
| 2 | 2015-01-01 02:00:00+01:00 | Valencia | 269.686 | 269.686 | 269.686 | 1002 | 78 | 0 | 23 | 0.0 | 0.0 | 0.0 | 0 | 800 | clear | sky is clear | 01n |
| 3 | 2015-01-01 03:00:00+01:00 | Valencia | 269.686 | 269.686 | 269.686 | 1002 | 78 | 0 | 23 | 0.0 | 0.0 | 0.0 | 0 | 800 | clear | sky is clear | 01n |
| 4 | 2015-01-01 04:00:00+01:00 | Valencia | 269.686 | 269.686 | 269.686 | 1002 | 78 | 0 | 23 | 0.0 | 0.0 | 0.0 | 0 | 800 | clear | sky is clear | 01n |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 178391 | 2018-12-31 19:00:00+01:00 | Seville | 287.760 | 287.150 | 288.150 | 1028 | 54 | 3 | 30 | 0.0 | 0.0 | 0.0 | 0 | 800 | clear | sky is clear | 01n |
| 178392 | 2018-12-31 20:00:00+01:00 | Seville | 285.760 | 285.150 | 286.150 | 1029 | 62 | 3 | 30 | 0.0 | 0.0 | 0.0 | 0 | 800 | clear | sky is clear | 01n |
| 178393 | 2018-12-31 21:00:00+01:00 | Seville | 285.150 | 285.150 | 285.150 | 1028 | 58 | 4 | 50 | 0.0 | 0.0 | 0.0 | 0 | 800 | clear | sky is clear | 01n |
| 178394 | 2018-12-31 22:00:00+01:00 | Seville | 284.150 | 284.150 | 284.150 | 1029 | 57 | 4 | 60 | 0.0 | 0.0 | 0.0 | 0 | 800 | clear | sky is clear | 01n |
| 178395 | 2018-12-31 23:00:00+01:00 | Seville | 283.970 | 282.150 | 285.150 | 1029 | 70 | 3 | 50 | 0.0 | 0.0 | 0.0 | 0 | 800 | clear | sky is clear | 01n |
178396 rows × 17 columns
# Calculate the rolling mean of the 'total load forecast' for the city of Valencia
# The rolling window size is set to 30 days (720 hours) to compute a moving average
energy_mean = final[final["city_name"]=="Valencia"]["total load forecast"].rolling(window=30*24).mean()
# Plot the rolling mean of energy load forecast for visualization
# The 'figsize' parameter adjusts the size of the plot
energy_mean.plot(figsize=(20,15))
<Axes: >
# The 'energy_mean' DataFrame contains the rolling mean of the 'total load forecast' for the city of Valencia.
energy_mean
0 NaN
5 NaN
10 NaN
15 NaN
20 NaN
...
178371 28783.105556
178376 28777.430556
178381 28773.012500
178386 28770.676389
178391 28769.665278
Name: total load forecast, Length: 35145, dtype: float64
# Define a function to calculate the maximum of the last 'n' values in a series excluding the current value at each index
def max_of_last_n_excluding_self(series, n):
max_values = []
for i in range(len(series)):
if i >= n:
max_value = max(series[i - n : i]) # Exclude the current value at index i
max_values.append(max_value)
else:
max_values.append(None) # Not enough data for the first n indices
return max_values
# Create a DataFrame 'actual_df' containing the actual 'total load forecast' for the city of Valencia
actual_df = final[final["city_name"]=="Valencia"]["total load forecast"].to_frame().rename(columns = {"total load forecast": "total load forecast_actual" })
# Calculate the maximum of the last 24 hours' load forecasts (excluding the current hour) and add it as a new column
actual_df["total load forecast_pred"] = max_of_last_n_excluding_self(final[final["city_name"]=="Valencia"]["total load forecast"],24)
# Concatenate the actual and predicted temperature
# Select from the second row, because there is no prediction for today due to shifting.
actual_df.dropna(inplace=True)
actual_df
| total load forecast_actual | total load forecast_pred | |
|---|---|---|
| 120 | 27309.0 | 30739.0 |
| 125 | 25397.0 | 30739.0 |
| 130 | 23640.0 | 30739.0 |
| 135 | 22638.0 | 30739.0 |
| 140 | 22238.0 | 30739.0 |
| ... | ... | ... |
| 178371 | 30619.0 | 30378.0 |
| 178376 | 29932.0 | 30619.0 |
| 178381 | 27903.0 | 30619.0 |
| 178386 | 25450.0 | 30619.0 |
| 178391 | 24424.0 | 30619.0 |
35121 rows × 2 columns
# Create a DataFrame 'actual_df' containing the actual 'total load forecast' for the city of Valencia
actual_df = final[final["city_name"]=="Valencia"]["total load forecast"].to_frame().rename(columns = {"total load forecast": "total load forecast_actual" })
# Calculate the rolling mean of the 'total load forecast' over a window of 24 hours and add it as a new column
actual_df["total load forecast_pred"] = final[final["city_name"]=="Valencia"]["total load forecast"].rolling(window=24).mean()
# Drop rows with missing values (NaN) in the DataFrame
actual_df.dropna(inplace=True)
# Initialize a MinMaxScaler for feature scaling
scalar = MinMaxScaler()
# Apply MinMax scaling to the 'actual_df' DataFrame
scaled_onestep=scalar.fit_transform(actual_df)
actual_df
| total load forecast_actual | total load forecast_pred | |
|---|---|---|
| 115 | 27589.0 | 24703.625000 |
| 120 | 27309.0 | 24753.250000 |
| 125 | 25397.0 | 24772.541667 |
| 130 | 23640.0 | 24777.750000 |
| 135 | 22638.0 | 24777.583333 |
| ... | ... | ... |
| 178371 | 30619.0 | 26266.833333 |
| 178376 | 29932.0 | 26258.166667 |
| 178381 | 27903.0 | 26155.041667 |
| 178386 | 25450.0 | 25989.708333 |
| 178391 | 24424.0 | 25877.458333 |
35122 rows × 2 columns
scaled_onestep
array([[0.40730084, 0.21785244],
[0.39527593, 0.22127015],
[0.31316298, 0.22259878],
...,
[0.42078591, 0.31781255],
[0.31543912, 0.30642593],
[0.27137642, 0.29869519]])
# Import necessary libraries
from sklearn.metrics import mean_squared_error as MSE
from math import sqrt
# Calculate the Root Mean Squared Error (RMSE) between two sets of scaled values
# The first column contains the actual values, and the second column contains the predicted values
temp_pred_err = MSE(scaled_onestep[:,0],scaled_onestep[:,1],squared=True)
# Print the calculated RMSE
print("The RMSE is",temp_pred_err)
The RMSE is 0.0386542457438511
scaled_onestep
array([[0.40730084, 0.21785244],
[0.39527593, 0.22127015],
[0.31316298, 0.22259878],
...,
[0.42078591, 0.31781255],
[0.31543912, 0.30642593],
[0.27137642, 0.29869519]])
# Create a DataFrame 'energy_series' to store scaled energy predictions and actual loads
energy_series=pd.DataFrame(scaled_onestep[:,1],columns=["Energy_predictions"])
energy_series["actual_load"]=scaled_onestep[:,0]
# Calculate the rolling mean of the energy predictions and actual loads over a window of 30 days (720 hours)
energy_mean=energy_series.rolling(window=30*24).mean()
#plot the mean of energy
energy_mean.plot(figsize=(20,15))
<Axes: >
scaled_onestep[:,0]
array([0.40730084, 0.39527593, 0.31316298, ..., 0.42078591, 0.31543912,
0.27137642])
scaled_onestep
array([[0.40730084, 0.21785244],
[0.39527593, 0.22127015],
[0.31316298, 0.22259878],
...,
[0.42078591, 0.31781255],
[0.31543912, 0.30642593],
[0.27137642, 0.29869519]])
import itertools
# Define the p, d and q parameters to take any value between 0 and 2
p = d = q = range(0, 2)
# Generate all different combinations of p, q and q triplets
pdq = list(itertools.product(p, d, q))
# Generate all different combinations of seasonal p, q and q triplets
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]
final.dtypes
dt_iso object city_name object temp float64 temp_min float64 temp_max float64 pressure int64 humidity int64 wind_speed int64 wind_deg int64 rain_1h float64 rain_3h float64 snow_3h float64 clouds_all int64 weather_id int64 weather_main object weather_description object weather_icon object time object generation biomass float64 generation fossil brown coal/lignite float64 generation fossil gas float64 generation fossil hard coal float64 generation fossil oil float64 generation hydro pumped storage consumption float64 generation hydro run-of-river and poundage float64 generation hydro water reservoir float64 generation nuclear float64 generation other float64 generation other renewable float64 generation solar float64 generation waste float64 generation wind onshore float64 forecast solar day ahead float64 forecast wind onshore day ahead float64 total load forecast float64 total load actual float64 price day ahead float64 price actual float64 energy_loss float64 dtype: object
# Checking linear Regression
final_x=final[final["city_name"]=="Valencia"][['temp', 'temp_min', 'temp_max', 'pressure', 'humidity', 'wind_speed',
'wind_deg', 'rain_1h', 'rain_3h', 'snow_3h', 'clouds_all', 'weather_id',
'generation biomass', 'generation fossil brown coal/lignite',
'generation fossil gas', 'generation fossil hard coal',
'generation fossil oil', 'generation hydro pumped storage consumption',
'generation hydro run-of-river and poundage',
'generation hydro water reservoir', 'generation nuclear',
'generation other', 'generation other renewable', 'generation solar',
'generation waste', 'generation wind onshore',
'forecast solar day ahead', 'forecast wind onshore day ahead',
'total load forecast']].dropna()
scaler = MinMaxScaler()
# Fit the scaler on the data and transform it
X = final_x[['temp', 'temp_min', 'temp_max', 'pressure', 'humidity', 'wind_speed',
'wind_deg', 'rain_1h', 'rain_3h', 'snow_3h', 'clouds_all', 'weather_id',
'generation biomass', 'generation fossil brown coal/lignite',
'generation fossil gas', 'generation fossil hard coal',
'generation fossil oil', 'generation hydro pumped storage consumption',
'generation hydro run-of-river and poundage',
'generation hydro water reservoir', 'generation nuclear',
'generation other', 'generation other renewable', 'generation solar',
'generation waste', 'generation wind onshore',
'forecast solar day ahead', 'forecast wind onshore day ahead']]
y = final_x[["total load forecast"]]
# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1) # 70% training and 30% test
X_train=scaler.fit_transform(X_train)
X_test=scaler.fit_transform(X_test)
y_train=scaler.fit_transform(y_train)
y_test=scaler.fit_transform(y_test)
# Create Decision Tree Classifier object
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import plot_tree
# Create a Decision Tree Regressor object with a specified maximum depth
clf = DecisionTreeRegressor(max_depth=8)
# Train the Decision Tree Regressor on the training dataset
clf = clf.fit(X_train,y_train)
# Predict the response for test dataset
y_pred = clf.predict(X_test)
# Create a figure for plotting the decision tree
fig=plt.figure(figsize=(45,10))
# Plot the decision tree with filled nodes
plot_tree(clf,filled=True,feature_names=['temp', 'temp_min', 'temp_max', 'pressure', 'humidity', 'wind_speed',
'wind_deg', 'rain_1h', 'rain_3h', 'snow_3h', 'clouds_all', 'weather_id',
'generation biomass', 'generation fossil brown coal/lignite',
'generation fossil gas', 'generation fossil hard coal',
'generation fossil oil', 'generation hydro pumped storage consumption',
'generation hydro run-of-river and poundage',
'generation hydro water reservoir', 'generation nuclear',
'generation other', 'generation other renewable', 'generation solar',
'generation waste', 'generation wind onshore',
'forecast solar day ahead', 'forecast wind onshore day ahead'],fontsize=20)
# Display the decision tree plot
plt.show()
# Import necessary libraries
from sklearn.metrics import mean_squared_error
# Calculate the Root Mean Squared Error (RMSE) between the true 'y_test' values and the predicted 'y_pred' values
# The 'squared=False' argument ensures that the RMSE is returned in its original scale
# Print the calculated RMSE for model evaluation
print("Model1 RMSE "+str(mean_squared_error(y_test,y_pred,squared=False)))
Model1 RMSE 0.11492390174981869
# Import necessary libraries
from sklearn.metrics import mean_absolute_error
# Print the calculated MAE for model evaluation
print("Model1 MAE "+str(mean_absolute_error(y_test,y_pred)))
Model1 MAE 0.088717941065767
# Import necessary libraries
import statsmodels.api as sm
# Add a constant column to the feature matrix 'X_train'
x = sm.add_constant(X_train)
# Fit an Ordinary Least Squares (OLS) regression model using 'x' as the feature matrix and 'y_train' as the target variable
model = sm.OLS(y_train, X_train).fit()
# Print the summary of the fitted OLS model, including statistical information and model performance metrics
print(model.summary())
OLS Regression Results
=======================================================================================
Dep. Variable: y R-squared (uncentered): 0.980
Model: OLS Adj. R-squared (uncentered): 0.980
Method: Least Squares F-statistic: 4.295e+04
Date: Tue, 15 Aug 2023 Prob (F-statistic): 0.00
Time: 02:12:49 Log-Likelihood: 30437.
No. Observations: 24585 AIC: -6.082e+04
Df Residuals: 24557 BIC: -6.059e+04
Df Model: 28
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
x1 -0.0314 0.071 -0.440 0.660 -0.171 0.108
x2 0.0880 0.037 2.396 0.017 0.016 0.160
x3 -0.1169 0.040 -2.913 0.004 -0.196 -0.038
x4 0.0256 0.005 5.548 0.000 0.017 0.035
x5 -0.0772 0.003 -27.600 0.000 -0.083 -0.072
x6 -0.0345 0.014 -2.531 0.011 -0.061 -0.008
x7 -0.0226 0.001 -15.496 0.000 -0.025 -0.020
x8 -0.3243 0.025 -13.175 0.000 -0.373 -0.276
x9 -0.0208 0.013 -1.592 0.111 -0.047 0.005
x10 -0.2245 0.037 -6.059 0.000 -0.297 -0.152
x11 -0.0080 0.002 -4.003 0.000 -0.012 -0.004
x12 -0.1311 0.004 -34.079 0.000 -0.139 -0.124
x13 -0.1365 0.005 -27.899 0.000 -0.146 -0.127
x14 0.0336 0.002 15.619 0.000 0.029 0.038
x15 0.6034 0.006 102.250 0.000 0.592 0.615
x16 0.2417 0.004 64.225 0.000 0.234 0.249
x17 0.1260 0.005 24.176 0.000 0.116 0.136
x18 -0.3281 0.003 -100.019 0.000 -0.335 -0.322
x19 -0.0123 0.004 -3.266 0.001 -0.020 -0.005
x20 0.4187 0.004 110.561 0.000 0.411 0.426
x21 0.1156 0.004 31.148 0.000 0.108 0.123
x22 0.0063 0.003 1.933 0.053 -8.92e-05 0.013
x23 -0.0617 0.006 -10.636 0.000 -0.073 -0.050
x24 0.0508 0.013 3.772 0.000 0.024 0.077
x25 0.0253 0.005 5.369 0.000 0.016 0.035
x26 0.3834 0.023 16.423 0.000 0.338 0.429
x27 0.1576 0.014 11.593 0.000 0.131 0.184
x28 0.1182 0.023 5.084 0.000 0.073 0.164
==============================================================================
Omnibus: 322.452 Durbin-Watson: 1.980
Prob(Omnibus): 0.000 Jarque-Bera (JB): 591.712
Skew: 0.027 Prob(JB): 3.25e-129
Kurtosis: 3.758 Cond. No. 514.
==============================================================================
Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
# Add a constant column to the feature matrix 'X_test'
x = sm.add_constant(X_test)
# Use the fitted OLS model 'model' to predict the target variable for the test dataset
predicted_list=model.predict(X_test)
# Display the predicted values for the test dataset
predicted_list
array([0.5302908 , 0.60057654, 0.64597567, ..., 0.5666231 , 0.46907775,
0.22274939])
from sklearn.metrics import mean_squared_error
# Calculate the Root Mean Squared Error (RMSE) between the true 'y_test' values and the predicted 'predicted_list' values
# The 'squared=False' argument ensures that the RMSE is returned in its original scale
# Print the calculated RMSE for model evaluation
print("Model1 RMSE "+str(mean_squared_error(y_test,predicted_list,squared=False)))
Model1 RMSE 0.08678651264112534
# Import necessary libraries
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
# Create a Lasso regression model with a specified regularization parameter (alpha)
reg = linear_model.Lasso(alpha=0.1)
# Fit the Lasso regression model using the training data ('X_train' and 'y_train')
reg.fit(X_train,y_train)
# Calculate and print the R-squared score of the Lasso model on the training data
print(reg.score(X_train,y_train))
0.0
# Import necessary libraries
from sklearn.metrics import mean_squared_error
# Use the trained Lasso model 'reg' to predict the target variable for the test dataset
predicted_list=reg.predict(X_test)
# Print the calculated RMSE for model evaluation
print("Model1 RMSE "+str(mean_squared_error(y_test,predicted_list,squared=False)))
Model1 RMSE 0.19958037729911932
from sklearn.decomposition import PCA
import numpy as np # Import NumPy
# Fit PCA and calculate explained variance
pca = PCA()
pca.fit(X_train) # X is your data matrix
explained_variance_ratio = pca.explained_variance_ratio_
# Plot cumulative explained variance
import matplotlib.pyplot as plt
plt.figure(figsize=(8,6))
plt.plot(np.cumsum(explained_variance_ratio))
plt.xlabel('Number of Principal Components')
plt.ylabel('Cumulative Explained Variance')
plt.show()
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
# Step 1: Normalize the data using StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_train)
X_test_scaled=scaler.fit_transform(X_test)
# Step 2: Perform PCA
pca = PCA(n_components=15) # Create a PCA object with 15 components
X_pca_train = pca.fit_transform(X_scaled) # Apply PCA to the normalized training data
X_pca_test = pca.fit_transform(X_test_scaled) # Apply PCA to the normalized test data
#for 15 components 95% variance captured
X_pca_train
array([[ 0.35664746, -1.62628323, -1.3136558 , ..., -1.34318139,
0.84314037, 0.74094237],
[-0.17754903, -1.99563622, 2.83850355, ..., 1.19729734,
0.56352396, 0.1770781 ],
[ 1.96399645, 4.17669541, -1.72121425, ..., -0.32360681,
0.76786134, -0.60591091],
...,
[-1.81625605, 3.98642503, 3.25749881, ..., -0.09425742,
-0.65614435, 0.17474444],
[ 0.49136231, 5.29209063, 0.17174245, ..., -0.66332313,
0.83978913, -1.22883956],
[ 1.35514104, 0.91199773, -0.55144859, ..., -0.99966362,
-2.04312735, 0.31304693]])
X_pca_test
array([[-0.67809883, 3.97624719, -2.06335347, ..., 0.47986341,
0.05772155, 0.18039093],
[-1.95231258, 0.43662293, 0.26922138, ..., 0.36639487,
0.12314122, -0.28634634],
[-1.15259208, -1.88037732, -0.62508409, ..., -1.26402646,
-1.40102374, -0.76403093],
...,
[ 1.64161227, 4.07773423, 2.79488827, ..., -0.94587568,
-0.91457452, -0.8681216 ],
[-0.32587695, 0.22216312, -2.23782486, ..., 1.21063252,
0.83198182, 0.34105151],
[ 3.81844739, -1.41876669, 0.51154829, ..., 0.4667914 ,
0.21061727, -1.32840204]])
# Import necessary libraries
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression
# Create a RandomForestRegressor object with a specified maximum depth and random seed
regr = RandomForestRegressor(max_depth=2, random_state=0)
# Fit the RandomForestRegressor model using the transformed training data 'X_pca_train' and target values 'y_train'
regr.fit(X_pca_train, y_train)
RandomForestRegressor(max_depth=2, random_state=0)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
RandomForestRegressor(max_depth=2, random_state=0)
# Import necessary libraries
from sklearn.model_selection import GridSearchCV, KFold
# Define a parameter grid for hyperparameter tuning
param_grid = {
'ccp_alpha': [0.1, 1, 10],
"max_depth":[2,4,6]
}
# Create a KFold cross-validation object with 5 splits
kf = KFold(n_splits=5, shuffle=True, random_state=41)
# Create a GridSearchCV object to perform hyperparameter tuning
grid_search = GridSearchCV(estimator=regr, param_grid=param_grid, cv=kf, scoring='neg_mean_squared_error')
# Fit the GridSearchCV on the transformed training data and target values
grid_search.fit(X_pca_train, y_train.reshape(-1))
# Retrieve the best parameters found during the grid search
best_params = grid_search.best_params_
# Evaluate the best model on the test set
best_model = grid_search.best_estimator_
test_accuracy = best_model.score(X_pca_test, y_test.reshape(-1))
# Print the results of hyperparameter tuning and evaluation
print("Best Parameters:", best_params)
print("Test Accuracy:", test_accuracy)
Best Parameters: {'ccp_alpha': 0.1, 'max_depth': 2}
Test Accuracy: -0.0009079569577314928
import sklearn.metrics as met
# Retrieve the keys (names) of all available scoring metrics
met.SCORERS.keys()
dict_keys(['explained_variance', 'r2', 'max_error', 'matthews_corrcoef', 'neg_median_absolute_error', 'neg_mean_absolute_error', 'neg_mean_absolute_percentage_error', 'neg_mean_squared_error', 'neg_mean_squared_log_error', 'neg_root_mean_squared_error', 'neg_mean_poisson_deviance', 'neg_mean_gamma_deviance', 'accuracy', 'top_k_accuracy', 'roc_auc', 'roc_auc_ovr', 'roc_auc_ovo', 'roc_auc_ovr_weighted', 'roc_auc_ovo_weighted', 'balanced_accuracy', 'average_precision', 'neg_log_loss', 'neg_brier_score', 'positive_likelihood_ratio', 'neg_negative_likelihood_ratio', 'adjusted_rand_score', 'rand_score', 'homogeneity_score', 'completeness_score', 'v_measure_score', 'mutual_info_score', 'adjusted_mutual_info_score', 'normalized_mutual_info_score', 'fowlkes_mallows_score', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'jaccard', 'jaccard_macro', 'jaccard_micro', 'jaccard_samples', 'jaccard_weighted'])
y_train
array([[0.23585393],
[0.40285074],
[0.29317027],
...,
[0.49797606],
[0.35685987],
[0.4940143 ]])
# Retrieve the keys (names) of the hyperparameters of the RandomForestRegressor
regr.get_params().keys()
dict_keys(['bootstrap', 'ccp_alpha', 'criterion', 'max_depth', 'max_features', 'max_leaf_nodes', 'max_samples', 'min_impurity_decrease', 'min_samples_leaf', 'min_samples_split', 'min_weight_fraction_leaf', 'n_estimators', 'n_jobs', 'oob_score', 'random_state', 'verbose', 'warm_start'])
# Create subplots with 1 row and 6 columns
fig, axes = plt.subplots(nrows = 1,ncols = 6,figsize = (10,2), dpi=900)
# Iterate through the first 6 estimators in the RandomForestRegressor
for index in range(0, 6):
# Plot the decision tree for the current estimator
tree.plot_tree(regr.estimators_[index],
feature_names=['temp', 'temp_min', 'temp_max', 'pressure', 'humidity', 'wind_speed',
'wind_deg', 'rain_1h', 'rain_3h', 'snow_3h', 'clouds_all', 'weather_id',
'generation biomass', 'generation fossil brown coal/lignite',
'generation fossil gas', 'generation fossil hard coal',
'generation fossil oil', 'generation hydro pumped storage consumption',
'generation hydro run-of-river and poundage',
'generation hydro water reservoir', 'generation nuclear',
'generation other', 'generation other renewable', 'generation solar',
'generation waste', 'generation wind onshore',
'forecast solar day ahead', 'forecast wind onshore day ahead'],
filled = True,
ax = axes[index]);
# Set the title for the current subplot
axes[index].set_title('Estimator: ' + str(index), fontsize = 11)
# Save the figure as an image file
fig.savefig('rf_5trees.png')
X_pca_test
array([[-0.67809883, 3.97624719, -2.06335347, ..., 0.47986341,
0.05772155, 0.18039093],
[-1.95231258, 0.43662293, 0.26922138, ..., 0.36639487,
0.12314122, -0.28634634],
[-1.15259208, -1.88037732, -0.62508409, ..., -1.26402646,
-1.40102374, -0.76403093],
...,
[ 1.64161227, 4.07773423, 2.79488827, ..., -0.94587568,
-0.91457452, -0.8681216 ],
[-0.32587695, 0.22216312, -2.23782486, ..., 1.21063252,
0.83198182, 0.34105151],
[ 3.81844739, -1.41876669, 0.51154829, ..., 0.4667914 ,
0.21061727, -1.32840204]])
from sklearn.metrics import mean_squared_error
y_pred_RF=regr.predict(X_pca_test)
print("Model1 MSE "+str(mean_squared_error(y_test,y_pred_RF)))
Model1 MSE 0.026724733755228974
from sklearn.metrics import mean_absolute_error
# Predict using the RandomForestRegressor
y_pred_RF=regr.predict(X_pca_test)
# Calculate and print the Mean Absolute Error (MAE)
print("Model1 MAE "+str(mean_absolute_error(y_test,y_pred_RF)))
Model1 MAE 0.13437720295643518
from sklearn.metrics import mean_absolute_percentage_error
# Predict using the RandomForestRegressor
y_pred_RF=regr.predict(X_pca_test)
# Calculate and print the Mean Absolute Percentage Error (MAPE)
print("Model1 MAPE "+str(mean_absolute_percentage_error(y_test,y_pred_RF)))
Model1 MAPE 134949500586.28116
# Predict using the RandomForestRegressor on the training data
y_pred_RF_train=regr.predict(X_pca_train)
# Create a DataFrame to store the predicted energy values
energy_series=pd.DataFrame(y_pred_RF,columns=["Energy_predictions"])
# Add a column to the energy_series DataFrame for actual energy values
energy_series["energy_actual"]=y_test
# Calculate the rolling mean of the predicted energy values
energy_mean = energy_series.rolling(window=30*24).mean()
# Create a plot of the rolling mean of predicted energy values
energy_mean.plot(figsize=(20,15))
# random forest
<Axes: >
X_test_reshaped=X_pca_test.reshape(10537, 1, 15)
X_train.shape
(24585, 28)
X_train_reshaped = X_pca_train.reshape(24585, 1, 15)
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Build an RNN model
model = Sequential()
model.add(SimpleRNN(units=32, activation='relu', input_shape=(X_train_reshaped.shape[1], X_train_reshaped.shape[2])))
model.add(Dense(units=1))
# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
# Train the model
model.fit(X_train_reshaped, y_train, epochs=10, batch_size=32, validation_split=0.2)
# Evaluate the model
y_pred_RNN = model.predict(X_test_reshaped)
mse = mean_squared_error(y_test, y_pred_RNN)
print("Mean Squared Error:", mse)
Epoch 1/10 615/615 [==============================] - 3s 3ms/step - loss: 0.0805 - val_loss: 0.0230 Epoch 2/10 615/615 [==============================] - 2s 3ms/step - loss: 0.0174 - val_loss: 0.0140 Epoch 3/10 615/615 [==============================] - 2s 3ms/step - loss: 0.0121 - val_loss: 0.0108 Epoch 4/10 615/615 [==============================] - 2s 3ms/step - loss: 0.0098 - val_loss: 0.0089 Epoch 5/10 615/615 [==============================] - 3s 4ms/step - loss: 0.0084 - val_loss: 0.0079 Epoch 6/10 615/615 [==============================] - 2s 4ms/step - loss: 0.0073 - val_loss: 0.0070 Epoch 7/10 615/615 [==============================] - 2s 4ms/step - loss: 0.0073 - val_loss: 0.0106 Epoch 8/10 615/615 [==============================] - 2s 3ms/step - loss: 0.0116 - val_loss: 0.0253 Epoch 9/10 615/615 [==============================] - 2s 3ms/step - loss: 0.0120 - val_loss: 0.0058 Epoch 10/10 615/615 [==============================] - 2s 3ms/step - loss: 0.0058 - val_loss: 0.0058 330/330 [==============================] - 1s 3ms/step Mean Squared Error: 0.013080175496853379
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
simple_rnn (SimpleRNN) (None, 32) 1536
dense (Dense) (None, 1) 33
=================================================================
Total params: 1,569
Trainable params: 1,569
Non-trainable params: 0
_________________________________________________________________
# Predict using the RNN model on the reshaped test data
y_pred_RNN = model.predict(X_test_reshaped)
# Calculate the Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred_RNN)
# Print the calculated Mean Squared Error
print("Mean Squared Error:", mse)
330/330 [==============================] - 1s 3ms/step Mean Squared Error: 0.013080175496853379
# Predict using the RNN model on the reshaped test data
from sklearn.metrics import mean_absolute_error
# Calculate the Mean Absolute Error (MAE)
y_pred_RNN =model.predict(X_test_reshaped)
# Print the calculated Mean Absolute Error
print("Model1 MSE "+str(mean_absolute_error(y_test,y_pred_RNN )))
330/330 [==============================] - 1s 2ms/step Model1 MSE 0.07403818270914564
# Predict using the RNN model on the reshaped test data
from sklearn.metrics import mean_absolute_percentage_error
# Calculate the Mean Absolute Percentage Error (MAPE)
y_pred_RNN =model.predict(X_test_reshaped)
# Print the calculated Mean Absolute Percentage Error
print("Model1 MAPE "+str(mean_absolute_percentage_error(y_test,y_pred_RNN)))
330/330 [==============================] - 1s 2ms/step Model1 MAPE 58265804734.14871
y_pred_RNN_train=model.predict(X_train_reshaped)
769/769 [==============================] - 1s 2ms/step
# Create a DataFrame for storing RNN predictions
energy_series=pd.DataFrame(y_pred_RNN,columns=["Energy_predictions"])
# Add actual energy values to the DataFrame
energy_series["energy_actual"]=y_test
# Calculate rolling mean of energy predictions
energy_mean = energy_series.rolling(window=30*24).mean()
# Create a line plot of the rolling mean of energy predictions
energy_mean.plot(figsize=(20,15))
# RNN
<Axes: >
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM
# Create a sequential model
model = Sequential()
# Add an LSTM layer
model.add(LSTM(units=64, return_sequences=True, input_shape=(X_train_reshaped.shape[1], X_train_reshaped.shape[2])))
model.add(Dense(units=1))
# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
# Train the model
model.fit(X_train_reshaped, y_train, epochs=10, batch_size=32, validation_split=0.2)
# Evaluate the model
y_pred_LSTM = model.predict(X_test_reshaped)
mse = mean_squared_error(y_test, y_pred_LSTM.reshape(10537))
print("Mean Squared Error:", mse)
Epoch 1/10 615/615 [==============================] - 6s 5ms/step - loss: 0.0164 - val_loss: 0.0064 Epoch 2/10 615/615 [==============================] - 3s 4ms/step - loss: 0.0059 - val_loss: 0.0056 Epoch 3/10 615/615 [==============================] - 3s 4ms/step - loss: 0.0054 - val_loss: 0.0053 Epoch 4/10 615/615 [==============================] - 4s 6ms/step - loss: 0.0052 - val_loss: 0.0050 Epoch 5/10 615/615 [==============================] - 3s 4ms/step - loss: 0.0049 - val_loss: 0.0048 Epoch 6/10 615/615 [==============================] - 3s 4ms/step - loss: 0.0047 - val_loss: 0.0047 Epoch 7/10 615/615 [==============================] - 2s 4ms/step - loss: 0.0046 - val_loss: 0.0048 Epoch 8/10 615/615 [==============================] - 3s 5ms/step - loss: 0.0045 - val_loss: 0.0044 Epoch 9/10 615/615 [==============================] - 3s 5ms/step - loss: 0.0044 - val_loss: 0.0045 Epoch 10/10 615/615 [==============================] - 3s 4ms/step - loss: 0.0043 - val_loss: 0.0045 330/330 [==============================] - 1s 2ms/step Mean Squared Error: 0.011152442743992654
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, 1, 64) 20480
dense_1 (Dense) (None, 1, 1) 65
=================================================================
Total params: 20,545
Trainable params: 20,545
Non-trainable params: 0
_________________________________________________________________
y_pred_LSTM_train=model.predict(X_train_reshaped)
769/769 [==============================] - 2s 2ms/step
# Create a DataFrame for storing LSTM predictions
energy_series=pd.DataFrame(y_pred_LSTM.reshape(10537),columns=["Energy_predictions"])
# Add actual energy values to the DataFrame
energy_series["energy_actual"]=y_test
# Calculate rolling mean of energy predictions
energy_mean = energy_series.rolling(window=30*24).mean()
# Create a line plot of the rolling mean of energy predictions
energy_mean.plot(figsize=(10,6))
# LSTM
<Axes: >
# Create a sequential model
model = Sequential()
# Add input layer with 64 units and ReLU activation function
# Add hidden layer with 32 units and ReLU activation function
model.add(Dense(units=64, activation='relu', input_shape=(X_train_reshaped.shape[1], X_train_reshaped.shape[2])))
model.add(Dense(units=32, activation='relu'))
# Add output layer with 1 unit (for regression)
model.add(Dense(units=1))
# Compile the model with Adam optimizer and mean squared error loss
model.compile(optimizer='adam', loss='mean_squared_error')
# Train the model with training data for 10 epochs and batch size of 32
model.fit(X_train_reshaped, y_train, epochs=10, batch_size=32, validation_split=0.2)
# Predict energy consumption using the trained model
y_pred_MLP = model.predict(X_test_reshaped)
# Calculate mean squared error
mse = mean_squared_error(y_test, y_pred_MLP.reshape(10537))
# Print the mean squared error
print("Mean Squared Error:", mse)
Epoch 1/10 615/615 [==============================] - 3s 3ms/step - loss: 0.0427 - val_loss: 0.0155 Epoch 2/10 615/615 [==============================] - 2s 3ms/step - loss: 0.0106 - val_loss: 0.0076 Epoch 3/10 615/615 [==============================] - 2s 3ms/step - loss: 0.0068 - val_loss: 0.0063 Epoch 4/10 615/615 [==============================] - 2s 3ms/step - loss: 0.0059 - val_loss: 0.0067 Epoch 5/10 615/615 [==============================] - 2s 4ms/step - loss: 0.0077 - val_loss: 0.0497 Epoch 6/10 615/615 [==============================] - 2s 3ms/step - loss: 0.0163 - val_loss: 0.0060 Epoch 7/10 615/615 [==============================] - 2s 3ms/step - loss: 0.0050 - val_loss: 0.0051 Epoch 8/10 615/615 [==============================] - 2s 3ms/step - loss: 0.0047 - val_loss: 0.0051 Epoch 9/10 615/615 [==============================] - 1s 2ms/step - loss: 0.0046 - val_loss: 0.0046 Epoch 10/10 615/615 [==============================] - 2s 3ms/step - loss: 0.0046 - val_loss: 0.0047 330/330 [==============================] - 1s 2ms/step Mean Squared Error: 0.006490054271212503
model.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_2 (Dense) (None, 1, 64) 1024
dense_3 (Dense) (None, 1, 32) 2080
dense_4 (Dense) (None, 1, 1) 33
=================================================================
Total params: 3,137
Trainable params: 3,137
Non-trainable params: 0
_________________________________________________________________
# Predict energy consumption using the trained MLP model on the training data
y_pred_MLP_train=model.predict(X_train_reshaped)
769/769 [==============================] - 1s 1ms/step
(X_train_reshaped.shape[1], X_train_reshaped.shape[2])
(1, 15)
#Create a DataFrame named energy_series with predicted energy values reshaped from y_pred_MLP
# Reshaping is done to make sure it matches the shape of y_test
energy_series=pd.DataFrame(y_pred_MLP.reshape(10537),columns=["Energy_predictions"])
# Add a new column to energy_series containing actual energy values from y_test
energy_series["energy_actual"]=y_test
# Calculate the rolling mean of the energy_series over a window of 30 days (30*24 hours)
energy_mean = energy_series.rolling(window=30*24).mean()
# Set the size of the plot to 20x15 inches
# Then, plot the rolling mean of energy_series using a line plot
energy_mean.plot(figsize=(20,15))
# MLP regression
<Axes: >
predictions_df
| Random Forest | RNN | MLP | LSTM | CNN_LSTM | Actual load | |
|---|---|---|---|---|---|---|
| 0 | 0.507204 | 0.212257 | 0.230389 | 0.188384 | 0.204368 | 0.235854 |
| 1 | 0.315739 | 0.418718 | 0.325366 | 0.316008 | 0.366726 | 0.402851 |
| 2 | 0.418931 | 0.287089 | 0.348004 | 0.308110 | 0.361589 | 0.293170 |
| 3 | 0.487849 | 0.605787 | 0.598816 | 0.562299 | 0.595291 | 0.615580 |
| 4 | 0.418931 | 0.271261 | 0.294836 | 0.285386 | 0.282516 | 0.375894 |
| ... | ... | ... | ... | ... | ... | ... |
| 24580 | 0.317454 | 0.364139 | 0.298728 | 0.282859 | 0.369462 | 0.376023 |
| 24581 | 0.502812 | 0.562295 | 0.534996 | 0.509357 | 0.542074 | 0.517699 |
| 24582 | 0.460907 | 0.468982 | 0.495694 | 0.556511 | 0.589115 | 0.497976 |
| 24583 | 0.418931 | 0.410248 | 0.322879 | 0.331517 | 0.393089 | 0.356860 |
| 24584 | 0.492767 | 0.459145 | 0.454469 | 0.485176 | 0.521329 | 0.494014 |
24585 rows × 6 columns
# Import the statsmodels library with an alias 'sm'
import statsmodels.api as sm
# Create a new DataFrame named 'x' by adding a constant column to the predictions_df
# The constant column is necessary for the intercept term in the linear regression model
x = sm.add_constant(predictions_df[["RNN","MLP","LSTM"]])
# Create a linear regression model using Ordinary Least Squares (OLS)
model = sm.OLS(predictions_df[["Actual load"]],predictions_df[["RNN","MLP","LSTM"]]).fit()
# Print a summary of the linear regression model's statistics
print(model.summary())
OLS Regression Results
=======================================================================================
Dep. Variable: Actual load R-squared (uncentered): 0.984
Model: OLS Adj. R-squared (uncentered): 0.984
Method: Least Squares F-statistic: 5.011e+05
Date: Tue, 15 Aug 2023 Prob (F-statistic): 0.00
Time: 02:46:07 Log-Likelihood: 33118.
No. Observations: 24585 AIC: -6.623e+04
Df Residuals: 24582 BIC: -6.621e+04
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
RNN 0.0743 0.009 8.362 0.000 0.057 0.092
MLP 0.3641 0.012 29.843 0.000 0.340 0.388
LSTM 0.5774 0.013 42.807 0.000 0.551 0.604
==============================================================================
Omnibus: 406.959 Durbin-Watson: 1.989
Prob(Omnibus): 0.000 Jarque-Bera (JB): 703.738
Skew: 0.131 Prob(JB): 1.53e-153
Kurtosis: 3.786 Cond. No. 36.1
==============================================================================
Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
predictions_df
| Random Forest | RNN | MLP | LSTM | CNN_LSTM | Actual load | |
|---|---|---|---|---|---|---|
| 0 | 0.507204 | 0.212257 | 0.230389 | 0.188384 | 0.204368 | 0.235854 |
| 1 | 0.315739 | 0.418718 | 0.325366 | 0.316008 | 0.366726 | 0.402851 |
| 2 | 0.418931 | 0.287089 | 0.348004 | 0.308110 | 0.361589 | 0.293170 |
| 3 | 0.487849 | 0.605787 | 0.598816 | 0.562299 | 0.595291 | 0.615580 |
| 4 | 0.418931 | 0.271261 | 0.294836 | 0.285386 | 0.282516 | 0.375894 |
| ... | ... | ... | ... | ... | ... | ... |
| 24580 | 0.317454 | 0.364139 | 0.298728 | 0.282859 | 0.369462 | 0.376023 |
| 24581 | 0.502812 | 0.562295 | 0.534996 | 0.509357 | 0.542074 | 0.517699 |
| 24582 | 0.460907 | 0.468982 | 0.495694 | 0.556511 | 0.589115 | 0.497976 |
| 24583 | 0.418931 | 0.410248 | 0.322879 | 0.331517 | 0.393089 | 0.356860 |
| 24584 | 0.492767 | 0.459145 | 0.454469 | 0.485176 | 0.521329 | 0.494014 |
24585 rows × 6 columns
Voting Ensembling
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, LSTM, Dense
import numpy as np
# Create a Sequential model
model = Sequential()
# Add a 1D convolutional layer with ReLU activation
model.add(Conv1D(filters=32, kernel_size=(1), activation='relu', input_shape=(1,15)))
# Add a MaxPooling1D layer
model.add(MaxPooling1D(pool_size=1))
# Add an LSTM layer with 64 units and return sequences
model.add(LSTM(units=64, return_sequences=True))
# Add Dense layer for final prediction
model.add(Dense(units=1, activation='relu'))
# Compile the model using Adam optimizer and mean squared error loss
model.compile(optimizer='adam', loss='mean_squared_error')
# Print model summary
model.summary()
# Fit the model to the training data
model.fit(X_pca_train.reshape(24585,1,15), y_train, epochs=10, batch_size=32)
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_1 (Conv1D) (None, 1, 32) 512
max_pooling1d_1 (MaxPooling (None, 1, 32) 0
1D)
lstm_2 (LSTM) (None, 1, 64) 24832
dense_6 (Dense) (None, 1, 1) 65
=================================================================
Total params: 25,409
Trainable params: 25,409
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
769/769 [==============================] - 7s 5ms/step - loss: 0.0102
Epoch 2/10
769/769 [==============================] - 4s 5ms/step - loss: 0.0056
Epoch 3/10
769/769 [==============================] - 4s 6ms/step - loss: 0.0051
Epoch 4/10
769/769 [==============================] - 4s 5ms/step - loss: 0.0049
Epoch 5/10
769/769 [==============================] - 3s 4ms/step - loss: 0.0047
Epoch 6/10
769/769 [==============================] - 4s 6ms/step - loss: 0.0046
Epoch 7/10
769/769 [==============================] - 4s 5ms/step - loss: 0.0044
Epoch 8/10
769/769 [==============================] - 3s 4ms/step - loss: 0.0043
Epoch 9/10
769/769 [==============================] - 4s 5ms/step - loss: 0.0042
Epoch 10/10
769/769 [==============================] - 4s 6ms/step - loss: 0.0042
<keras.callbacks.History at 0x7d47c17205b0>
# Use the trained model to make predictions on the test data
y_pred_CNN_LSTM=model.predict(X_test_reshaped)
330/330 [==============================] - 1s 2ms/step
# Use the trained model to make predictions on the reshaped training data
x=model.predict(X_train_reshaped)
769/769 [==============================] - 2s 2ms/step
# Reshape the predictions from the model into a 1D array
x.reshape(-1)
array([0.21149546, 0.36228505, 0.3161803 , ..., 0.54253733, 0.3554708 ,
0.48823586], dtype=float32)
# The variable X_test_reshaped contains the reshaped test data
X_test_reshaped
array([[[-0.67809883, 3.97624719, -2.06335347, ..., 0.47986341,
0.05772155, 0.18039093]],
[[-1.95231258, 0.43662293, 0.26922138, ..., 0.36639487,
0.12314122, -0.28634634]],
[[-1.15259208, -1.88037732, -0.62508409, ..., -1.26402646,
-1.40102374, -0.76403093]],
...,
[[ 1.64161227, 4.07773423, 2.79488827, ..., -0.94587568,
-0.91457452, -0.8681216 ]],
[[-0.32587695, 0.22216312, -2.23782486, ..., 1.21063252,
0.83198182, 0.34105151]],
[[ 3.81844739, -1.41876669, 0.51154829, ..., 0.4667914 ,
0.21061727, -1.32840204]]])
X_test_reshaped
array([[[-0.67809883, 3.97624719, -2.06335347, ..., 0.47986341,
0.05772155, 0.18039093]],
[[-1.95231258, 0.43662293, 0.26922138, ..., 0.36639487,
0.12314122, -0.28634634]],
[[-1.15259208, -1.88037732, -0.62508409, ..., -1.26402646,
-1.40102374, -0.76403093]],
...,
[[ 1.64161227, 4.07773423, 2.79488827, ..., -0.94587568,
-0.91457452, -0.8681216 ]],
[[-0.32587695, 0.22216312, -2.23782486, ..., 1.21063252,
0.83198182, 0.34105151]],
[[ 3.81844739, -1.41876669, 0.51154829, ..., 0.4667914 ,
0.21061727, -1.32840204]]])
# Create a DataFrame named energy_series with predicted energy values from the CNN LSTM model
energy_series=pd.DataFrame(y_pred_CNN_LSTM.reshape(-1),columns=["Energy_predictions"])
# Add a new column to energy_series containing actual energy values from y_test
energy_series["energy_actual"]=y_test
# Calculate the rolling mean of energy_series over a window of 30 days (30*24 hours)
energy_mean = energy_series.rolling(window=30*24).mean()
# Set the size of the plot to 20x10 inches
# Then, plot the rolling mean of energy_series using a line plot
energy_mean.plot(figsize=(20,10))
# CNN LSTM hybrid model
<Axes: >
# Calculate the mean squared error (MSE) between the actual energy values (y_test) and predictions from the CNN LSTM model
mse = mean_squared_error(y_test, y_pred_CNN_LSTM.reshape(-1))
# Print the calculated mean squared error
print("Mean Squared Error:", mse)
Mean Squared Error: 0.006008444888229008
# Create a DataFrame named energy_series with energy predictions from y_train
energy_series=pd.DataFrame(y_train,columns=["Energy_predictions"])
# Calculate the rolling mean of energy predictions from the DataFrame energy_series
y=energy_series["Energy_predictions"].rolling(window=30*24).mean()[:1024]
y
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
...
1019 0.453184
1020 0.452730
1021 0.452862
1022 0.452641
1023 0.452862
Name: Energy_predictions, Length: 1024, dtype: float64
energy_series
| Energy_predictions | |
|---|---|
| 0 | 0.235854 |
| 1 | 0.402851 |
| 2 | 0.293170 |
| 3 | 0.615580 |
| 4 | 0.375894 |
| ... | ... |
| 24580 | 0.376023 |
| 24581 | 0.517699 |
| 24582 | 0.497976 |
| 24583 | 0.356860 |
| 24584 | 0.494014 |
24585 rows × 1 columns
len(X_test_reshaped)
10537
y_train
array([[0.23585393],
[0.40285074],
[0.29317027],
...,
[0.49797606],
[0.35685987],
[0.4940143 ]])
# Create a DataFrame named predictions_df containing predictions from different models and actual load values
predictions_df=pd.DataFrame(data=y_pred_RF_train,columns=["Random Forest"])
predictions_df["RNN"]=y_pred_RNN_train.reshape(24585)
predictions_df["MLP"]=y_pred_MLP_train.reshape(24585)
predictions_df["LSTM"]=y_pred_LSTM_train.reshape(24585)
predictions_df["CNN_LSTM"]=x.reshape(24585)
predictions_df["Actual load"]=y_train
predictions_df
| Random Forest | RNN | MLP | LSTM | CNN_LSTM | Actual load | |
|---|---|---|---|---|---|---|
| 0 | 0.507204 | 0.212257 | 0.230389 | 0.188384 | 0.211495 | 0.235854 |
| 1 | 0.315739 | 0.418718 | 0.325366 | 0.316008 | 0.362285 | 0.402851 |
| 2 | 0.418931 | 0.287089 | 0.348004 | 0.308110 | 0.316180 | 0.293170 |
| 3 | 0.487849 | 0.605787 | 0.598816 | 0.562299 | 0.597373 | 0.615580 |
| 4 | 0.418931 | 0.271261 | 0.294836 | 0.285386 | 0.250318 | 0.375894 |
| ... | ... | ... | ... | ... | ... | ... |
| 24580 | 0.317454 | 0.364139 | 0.298728 | 0.282859 | 0.331058 | 0.376023 |
| 24581 | 0.502812 | 0.562295 | 0.534996 | 0.509357 | 0.541966 | 0.517699 |
| 24582 | 0.460907 | 0.468982 | 0.495694 | 0.556511 | 0.542537 | 0.497976 |
| 24583 | 0.418931 | 0.410248 | 0.322879 | 0.331517 | 0.355471 | 0.356860 |
| 24584 | 0.492767 | 0.459145 | 0.454469 | 0.485176 | 0.488236 | 0.494014 |
24585 rows × 6 columns
# Create a DataFrame named predictions_df_test containing predictions from different models, actual load values, and y_actual
predictions_df_test=pd.DataFrame(data=y_pred_RF,columns=["Random Forest"])
predictions_df_test["RNN"]=y_pred_RNN.reshape(10537)
predictions_df_test["MLP"]=y_pred_MLP.reshape(10537)
predictions_df_test["LSTM"]=y_pred_LSTM.reshape(10537)
predictions_df_test["CNN_LSTM"]=y_pred_CNN_LSTM.reshape(10537)
predictions_df_test["y_actual"]=y_test
predictions_df_test
| Random Forest | RNN | MLP | LSTM | CNN_LSTM | y_actual | |
|---|---|---|---|---|---|---|
| 0 | 0.481352 | 0.443365 | 0.574335 | 0.560276 | 0.583084 | 0.570911 |
| 1 | 0.517339 | 0.624323 | 0.547839 | 0.664262 | 0.560572 | 0.567597 |
| 2 | 0.516851 | 0.673315 | 0.677402 | 0.682022 | 0.654718 | 0.605277 |
| 3 | 0.553069 | 0.691995 | 0.653069 | 0.627103 | 0.666148 | 0.600262 |
| 4 | 0.317454 | 0.273155 | 0.278417 | 0.222353 | 0.266471 | 0.162102 |
| ... | ... | ... | ... | ... | ... | ... |
| 10532 | 0.442921 | 0.565241 | 0.482390 | 0.611821 | 0.521983 | 0.579067 |
| 10533 | 0.315739 | 0.249783 | 0.141433 | 0.176199 | 0.165419 | 0.190493 |
| 10534 | 0.488384 | 0.524229 | 0.520902 | 0.569639 | 0.572822 | 0.579546 |
| 10535 | 0.515490 | 0.536594 | 0.581097 | 0.332666 | 0.570930 | 0.444570 |
| 10536 | 0.317454 | 0.264106 | 0.225027 | 0.247447 | 0.272645 | 0.306454 |
10537 rows × 6 columns
ENSEMBLED STACK
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.linear_model import LinearRegression
# Define the parameter grid for hyperparameter search
param_grid = {
'n_jobs': [1,2,3,4,5,6,7,8,9,10]
}
# Create a Linear Regression model
model = LinearRegression()
# Initialize KFold cross-validation with 5 splits
kf = KFold(n_splits=5, shuffle=True, random_state=42)
# Perform GridSearchCV with negative mean squared error as scoring metric
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=kf, scoring='neg_mean_squared_error')
grid_search.fit(predictions_df[["RNN","MLP","LSTM","Random Forest","CNN_LSTM"]], predictions_df[["Actual load"]])
# Get the best model from the grid search
best_params = grid_search.best_params_
# Evaluate the best model on the test set
best_model = grid_search.best_estimator_
test_accuracy = best_model.score(predictions_df_test[["RNN","MLP","LSTM","Random Forest","CNN_LSTM"]], predictions_df_test[["y_actual"]])
# Print the best parameters and test accuracy
print("Best Parameters:", best_params)
print("Test Accuracy:", test_accuracy)
Best Parameters: {'n_jobs': 1}
Test Accuracy: 0.8491999432211498
# Use the best_model to make predictions on the test set using the selected ensemble of models
predictions_df_test["Meta_model_prediction"]=best_model.predict(predictions_df_test[["RNN","MLP","LSTM","Random Forest","CNN_LSTM"]])
predictions_df_test
| Random Forest | RNN | MLP | LSTM | CNN_LSTM | y_actual | Meta_model_prediction | |
|---|---|---|---|---|---|---|---|
| 0 | 0.481352 | 0.443365 | 0.574335 | 0.560276 | 0.583084 | 0.570911 | 0.582810 |
| 1 | 0.517339 | 0.624323 | 0.547839 | 0.664262 | 0.560572 | 0.567597 | 0.581195 |
| 2 | 0.516851 | 0.673315 | 0.677402 | 0.682022 | 0.654718 | 0.605277 | 0.665526 |
| 3 | 0.553069 | 0.691995 | 0.653069 | 0.627103 | 0.666148 | 0.600262 | 0.649373 |
| 4 | 0.317454 | 0.273155 | 0.278417 | 0.222353 | 0.266471 | 0.162102 | 0.265362 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 10532 | 0.442921 | 0.565241 | 0.482390 | 0.611821 | 0.521983 | 0.579067 | 0.536351 |
| 10533 | 0.315739 | 0.249783 | 0.141433 | 0.176199 | 0.165419 | 0.190493 | 0.166835 |
| 10534 | 0.488384 | 0.524229 | 0.520902 | 0.569639 | 0.572822 | 0.579546 | 0.562654 |
| 10535 | 0.515490 | 0.536594 | 0.581097 | 0.332666 | 0.570930 | 0.444570 | 0.512188 |
| 10536 | 0.317454 | 0.264106 | 0.225027 | 0.247447 | 0.272645 | 0.306454 | 0.263070 |
10537 rows × 7 columns
# Calculate the rolling mean of the columns "y_actual" and "Meta_model_prediction" from predictions_df_test
energy_mean = predictions_df_test[["y_actual","Meta_model_prediction"]].rolling(window=30*24).mean()
# Set the size of the plot to 20x15 inches
# Then, plot the rolling mean using a line plot
energy_mean.plot(figsize=(20,15))
# Linear regression ensemble meta model
<Axes: >
# Calculate the mean squared error (MSE) between the actual energy values (y_actual) and ensemble model predictions (Meta_model_prediction)
mse = mean_squared_error(predictions_df_test["y_actual"], predictions_df_test["Meta_model_prediction"])
# Print the calculated mean squared error
print("Mean Squared Error:", mse)
Mean Squared Error: 0.006001151615983355
best_model.coef_
array([[-0.05035801, 0.23785579, 0.26249903, -0.03414418, 0.54491015]])
import statsmodels.api as sm
# Add a constant term to the independent variables
x = sm.add_constant(predictions_df[["RNN","MLP","LSTM","Random Forest","CNN_LSTM"]])
# Fit an Ordinary Least Squares (OLS) model
model = sm.OLS(predictions_df[["Actual load"]],predictions_df[["RNN","MLP","LSTM","Random Forest","CNN_LSTM"]]).fit()
# Print the summary of the OLS model
print(model.summary())
OLS Regression Results
=======================================================================================
Dep. Variable: Actual load R-squared (uncentered): 0.985
Model: OLS Adj. R-squared (uncentered): 0.985
Method: Least Squares F-statistic: 3.173e+05
Date: Tue, 15 Aug 2023 Prob (F-statistic): 0.00
Time: 02:55:23 Log-Likelihood: 33770.
No. Observations: 24585 AIC: -6.753e+04
Df Residuals: 24580 BIC: -6.749e+04
Df Model: 5
Covariance Type: nonrobust
=================================================================================
coef std err t P>|t| [0.025 0.975]
---------------------------------------------------------------------------------
RNN -0.0376 0.009 -4.023 0.000 -0.056 -0.019
MLP 0.2186 0.013 17.444 0.000 0.194 0.243
LSTM 0.2637 0.016 16.736 0.000 0.233 0.295
Random Forest 0.0148 0.003 5.140 0.000 0.009 0.020
CNN_LSTM 0.5441 0.015 36.322 0.000 0.515 0.573
==============================================================================
Omnibus: 460.593 Durbin-Watson: 1.986
Prob(Omnibus): 0.000 Jarque-Bera (JB): 819.988
Skew: 0.144 Prob(JB): 8.75e-179
Kurtosis: 3.847 Cond. No. 53.0
==============================================================================
Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
def max_of_last_n(series, n):
max_values = []
for i in range(len(series)):
if i >= n:
max_value = max(series[i - n + 1 : i + 1])
max_values.append(max_value)
else:
max_values.append(None) # Not enough data for the first n indices
return max_values
# Example usage
data_series = [5, 8, 3, 12, 6, 9, 15, 7, 10, 20]
n = 3
result = max_of_last_n(data_series, n)
print(result)
[None, None, None, 12, 12, 12, 15, 15, 15, 20]
VOTING
# Calculate the ensemble prediction by averaging predictions from different models
predictions_df_test["model_predictions_voting"]=(predictions_df_test["Random Forest"]+
predictions_df_test["RNN"]+predictions_df_test["MLP"]+predictions_df_test["LSTM"]+predictions_df_test["CNN_LSTM"])/5
predictions_df_test
| Random Forest | RNN | MLP | LSTM | CNN_LSTM | y_actual | Meta_model_prediction | model_predictions_voting | |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.481352 | 0.443365 | 0.574335 | 0.560276 | 0.583084 | 0.570911 | 0.582810 | 0.528482 |
| 1 | 0.517339 | 0.624323 | 0.547839 | 0.664262 | 0.560572 | 0.567597 | 0.581195 | 0.582867 |
| 2 | 0.516851 | 0.673315 | 0.677402 | 0.682022 | 0.654718 | 0.605277 | 0.665526 | 0.640862 |
| 3 | 0.553069 | 0.691995 | 0.653069 | 0.627103 | 0.666148 | 0.600262 | 0.649373 | 0.638277 |
| 4 | 0.317454 | 0.273155 | 0.278417 | 0.222353 | 0.266471 | 0.162102 | 0.265362 | 0.271570 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10532 | 0.442921 | 0.565241 | 0.482390 | 0.611821 | 0.521983 | 0.579067 | 0.536351 | 0.524871 |
| 10533 | 0.315739 | 0.249783 | 0.141433 | 0.176199 | 0.165419 | 0.190493 | 0.166835 | 0.209715 |
| 10534 | 0.488384 | 0.524229 | 0.520902 | 0.569639 | 0.572822 | 0.579546 | 0.562654 | 0.535195 |
| 10535 | 0.515490 | 0.536594 | 0.581097 | 0.332666 | 0.570930 | 0.444570 | 0.512188 | 0.507355 |
| 10536 | 0.317454 | 0.264106 | 0.225027 | 0.247447 | 0.272645 | 0.306454 | 0.263070 | 0.265336 |
10537 rows × 8 columns
# Calculate the rolling mean of the columns "y_actual," "Meta_model_prediction," and "model_predictions_voting" from predictions_df_test
energy_mean = predictions_df_test[["y_actual","Meta_model_prediction","model_predictions_voting"]].rolling(window=30*24).mean()
# Set the size of the plot to 20x15 inches
# Then, plot the rolling mean using a line plot
energy_mean.plot(figsize=(20,15))
<Axes: >
from sklearn.metrics import mean_absolute_error
print("Model1 MSE "+str(mean_absolute_error(predictions_df_test["y_actual"],predictions_df_test["Meta_model_prediction"])))
Model1 MSE 0.06060945614065426
!jupyter nbconvert --to html datamining_final_project_ensemble_Kfold_validation_scalecorrelection(2).ipynb
/bin/bash: -c: line 1: syntax error near unexpected token `('
/bin/bash: -c: line 1: `jupyter nbconvert --to html datamining_final_project_ensemble_Kfold_validation_scalecorrelection(2).ipynb'
!jupyter nbconvert --to html datamining_final_project_ensemble_Kfold_validation_scalecorrelection v.ipynb
[NbConvertApp] WARNING | pattern 'datamining_final_project_ensemble_Kfold_validation_scalecorrelection' matched no files
[NbConvertApp] WARNING | pattern 'v.ipynb' matched no files
This application is used to convert notebook files (*.ipynb)
to various other formats.
WARNING: THE COMMANDLINE INTERFACE MAY CHANGE IN FUTURE RELEASES.
Options
=======
The options below are convenience aliases to configurable class-options,
as listed in the "Equivalent to" description-line of the aliases.
To see all configurable class-options for some <cmd>, use:
<cmd> --help-all
--debug
set log level to logging.DEBUG (maximize logging output)
Equivalent to: [--Application.log_level=10]
--show-config
Show the application's configuration (human-readable format)
Equivalent to: [--Application.show_config=True]
--show-config-json
Show the application's configuration (json format)
Equivalent to: [--Application.show_config_json=True]
--generate-config
generate default config file
Equivalent to: [--JupyterApp.generate_config=True]
-y
Answer yes to any questions instead of prompting.
Equivalent to: [--JupyterApp.answer_yes=True]
--execute
Execute the notebook prior to export.
Equivalent to: [--ExecutePreprocessor.enabled=True]
--allow-errors
Continue notebook execution even if one of the cells throws an error and include the error message in the cell output (the default behaviour is to abort conversion). This flag is only relevant if '--execute' was specified, too.
Equivalent to: [--ExecutePreprocessor.allow_errors=True]
--stdin
read a single notebook file from stdin. Write the resulting notebook with default basename 'notebook.*'
Equivalent to: [--NbConvertApp.from_stdin=True]
--stdout
Write notebook output to stdout instead of files.
Equivalent to: [--NbConvertApp.writer_class=StdoutWriter]
--inplace
Run nbconvert in place, overwriting the existing notebook (only
relevant when converting to notebook format)
Equivalent to: [--NbConvertApp.use_output_suffix=False --NbConvertApp.export_format=notebook --FilesWriter.build_directory=]
--clear-output
Clear output of current file and save in place,
overwriting the existing notebook.
Equivalent to: [--NbConvertApp.use_output_suffix=False --NbConvertApp.export_format=notebook --FilesWriter.build_directory= --ClearOutputPreprocessor.enabled=True]
--no-prompt
Exclude input and output prompts from converted document.
Equivalent to: [--TemplateExporter.exclude_input_prompt=True --TemplateExporter.exclude_output_prompt=True]
--no-input
Exclude input cells and output prompts from converted document.
This mode is ideal for generating code-free reports.
Equivalent to: [--TemplateExporter.exclude_output_prompt=True --TemplateExporter.exclude_input=True --TemplateExporter.exclude_input_prompt=True]
--allow-chromium-download
Whether to allow downloading chromium if no suitable version is found on the system.
Equivalent to: [--WebPDFExporter.allow_chromium_download=True]
--disable-chromium-sandbox
Disable chromium security sandbox when converting to PDF..
Equivalent to: [--WebPDFExporter.disable_sandbox=True]
--show-input
Shows code input. This flag is only useful for dejavu users.
Equivalent to: [--TemplateExporter.exclude_input=False]
--embed-images
Embed the images as base64 dataurls in the output. This flag is only useful for the HTML/WebPDF/Slides exports.
Equivalent to: [--HTMLExporter.embed_images=True]
--sanitize-html
Whether the HTML in Markdown cells and cell outputs should be sanitized..
Equivalent to: [--HTMLExporter.sanitize_html=True]
--log-level=<Enum>
Set the log level by value or name.
Choices: any of [0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL']
Default: 30
Equivalent to: [--Application.log_level]
--config=<Unicode>
Full path of a config file.
Default: ''
Equivalent to: [--JupyterApp.config_file]
--to=<Unicode>
The export format to be used, either one of the built-in formats
['asciidoc', 'custom', 'html', 'latex', 'markdown', 'notebook', 'pdf', 'python', 'rst', 'script', 'slides', 'webpdf']
or a dotted object name that represents the import path for an
``Exporter`` class
Default: ''
Equivalent to: [--NbConvertApp.export_format]
--template=<Unicode>
Name of the template to use
Default: ''
Equivalent to: [--TemplateExporter.template_name]
--template-file=<Unicode>
Name of the template file to use
Default: None
Equivalent to: [--TemplateExporter.template_file]
--theme=<Unicode>
Template specific theme(e.g. the name of a JupyterLab CSS theme distributed
as prebuilt extension for the lab template)
Default: 'light'
Equivalent to: [--HTMLExporter.theme]
--sanitize_html=<Bool>
Whether the HTML in Markdown cells and cell outputs should be sanitized.This
should be set to True by nbviewer or similar tools.
Default: False
Equivalent to: [--HTMLExporter.sanitize_html]
--writer=<DottedObjectName>
Writer class used to write the
results of the conversion
Default: 'FilesWriter'
Equivalent to: [--NbConvertApp.writer_class]
--post=<DottedOrNone>
PostProcessor class used to write the
results of the conversion
Default: ''
Equivalent to: [--NbConvertApp.postprocessor_class]
--output=<Unicode>
overwrite base name use for output files.
can only be used when converting one notebook at a time.
Default: ''
Equivalent to: [--NbConvertApp.output_base]
--output-dir=<Unicode>
Directory to write output(s) to. Defaults
to output to the directory of each notebook. To recover
previous default behaviour (outputting to the current
working directory) use . as the flag value.
Default: ''
Equivalent to: [--FilesWriter.build_directory]
--reveal-prefix=<Unicode>
The URL prefix for reveal.js (version 3.x).
This defaults to the reveal CDN, but can be any url pointing to a copy
of reveal.js.
For speaker notes to work, this must be a relative path to a local
copy of reveal.js: e.g., "reveal.js".
If a relative path is given, it must be a subdirectory of the
current directory (from which the server is run).
See the usage documentation
(https://nbconvert.readthedocs.io/en/latest/usage.html#reveal-js-html-slideshow)
for more details.
Default: ''
Equivalent to: [--SlidesExporter.reveal_url_prefix]
--nbformat=<Enum>
The nbformat version to write.
Use this to downgrade notebooks.
Choices: any of [1, 2, 3, 4]
Default: 4
Equivalent to: [--NotebookExporter.nbformat_version]
Examples
--------
The simplest way to use nbconvert is
> jupyter nbconvert mynotebook.ipynb --to html
Options include ['asciidoc', 'custom', 'html', 'latex', 'markdown', 'notebook', 'pdf', 'python', 'rst', 'script', 'slides', 'webpdf'].
> jupyter nbconvert --to latex mynotebook.ipynb
Both HTML and LaTeX support multiple output templates. LaTeX includes
'base', 'article' and 'report'. HTML includes 'basic', 'lab' and
'classic'. You can specify the flavor of the format used.
> jupyter nbconvert --to html --template lab mynotebook.ipynb
You can also pipe the output to stdout, rather than a file
> jupyter nbconvert mynotebook.ipynb --stdout
PDF is generated via latex
> jupyter nbconvert mynotebook.ipynb --to pdf
You can get (and serve) a Reveal.js-powered slideshow
> jupyter nbconvert myslides.ipynb --to slides --post serve
Multiple notebooks can be given at the command line in a couple of
different ways:
> jupyter nbconvert notebook*.ipynb
> jupyter nbconvert notebook1.ipynb notebook2.ipynb
or you can specify the notebooks list in a config file, containing::
c.NbConvertApp.notebooks = ["my_notebook.ipynb"]
> jupyter nbconvert --config mycfg.py
To see all available configurables, use `--help-all`.